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Abstract. We discuss the numerical solution methods available when solving for 
the steady-state density matrix of a time-independent open quantum optical system, 
where the system operators are expressed in a suitable basis representation as sparse 
matrices. In particular, we focus on the difficulties posed by the non-Hermitian 
structure of the Lindblad super operator, and the numerical techniques designed to 
mitigate these pitfalls. In addition, we introduce a doubly iterative inverse-power 
method that can give reduced memory and runtime requirements in situations where 
other iterative methods are limited due to poor bandwidth and profile reduction. The 
relevant methods are demonstrated on several prototypical quantum optical systems 
where it is found that iterative methods based on iLU factorization using reverse 
Cuthill-Mckee ordering tend to outperform other solution techniques in terms of both 
memory consumption and runtime as the size of the underlying Hilbert space increases. 
For eigenvalue solving, Krylov iterations using the stabilized bi-conjugate gradient 
method outperform generalized minimal residual methods. In contrast, minimal 
residual methods work best for solvers based on direct LU decomposition. This work 
serves as a guide for solving the steady-state density matrix of an arbitrary quantum 
optical system, and points to several avenues of future research that will extend the 
applicability of these classical algorithms in absence of a quantum computer. 
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1. Introduction 

Understanding the interplay between a quantum system and its environment is of 
fundamental importance for realistic quantum systems [1]. For although much care 
is taken experimentally to eliminate the unwanted influence of external interactions, 
there remains, if ever so slight, a coupling between the system of interest and the 
external world. In addition, any measurement performed on the system, classical or 
quantum, necessarily involves coupling to the measurement device, thereby introducing 
an additional source of external influence [2, 3]. By dehnition, an open quantum system 
is coupled to an environment where the complexity of the environmental dynamics 
renders the combined evolution of system plus reservoir intractable. However, for a 
system weakly coupled to its surroundings, there is a clear distinction between the 
system and its environment, allowing for the dynamics of the latter to be traced over, 
resulting in a reduced density matrix describing the system alone. 

Finding the steady-state solution to the reduced density matrix of an open quantum 
system is of great importance for both experimental and theoretical investigations of 
quantum optics and related systems such as trapped ions, superconducting circuits, and 
quantum nanomechanical systems (e.g. see Refs. [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]). 
In addition to being a good approximation to the density matrix for many experimental 
quantum systems, in some devices it is possible to observe quantum features in the 
steady-state density matrix, such as negative Wigner functions [11, 16, 15], under 
experimentally feasible operating conditions that can be repeatedly measured without 
detrimentally affecting the state of the system. Often the steady-state density matrix 
cannot be expressed analytically, and as such, the efficient numerical simulation of 
this operator is a key component in the exploration of a multitude of quantum optical 
systems. 

In absence of a quantum computer, the numerical calculation of the steady- 
state density matrix must be solved using classical computing resources, where the 
fundamental limit on the size of quantum system that can be explored is constrained by 
the exponentially increasing dimensionality of the underlying composite Hilbert space. 
However, when representing the quantum mechanical operators in a chosen basis using 
sparse matrices, the dimensionality of the underlying Hilbert space is currently not the 
limiting factor in determining the steady-state solution for a quantum optical system. 
Rather, it is the difficulties that arise due to the lack of Hermicity and poor conditioning 
of the Liouvillian super operator that lead to large runtimes and memory consumption 
when using sparse linear solution methods. As such, a thorough understanding of the 
available algorithms, and the development of techniques designed to overcome these 
limitations, is crucial to enabling the classical simulation of ever larger quantum systems. 

In this work, we detail the numerical solution methods available when solving for 
the steady-state density matrix for an arbitrary time-independent quantum optical 
system where the operators are represented as sparse matrices. We consider the 
standard eigenvalue and linear system solution methods, along with recently developed 
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iterative techniques and bandwidth reducing permutation methods designed to overcome 
the limitations of these standard approaches [16]. In addition, we put forth a 
doubly-iterative inverse power method, that yields better performance as compared 


to previously considered iterative methods when bandwidth reduction permutations 


are less effective. Here we show that, with the addition of this new doubly-iterative 
algorithm, iterative solution methods, in general, outperform their direct factorization 
counterparts in terms of both algorithm runtime and memory consumption for large 
quantum systems provided that the Liouvillian operator is suitably permuted such 
that the approximate inverse is well-conditioned. Both memory and solution time 
benchmarks can be reduced by an order of magnitude, or more, when using the 
appropriate iterative technique for a given Liouvillian. These methods allow for the 
simulation of ever larger quantum systems, and help to mitigate the need for a quantum 
computer. 

This paper is organized as follows. In Sec. 2 we introduce the eigenvalue equation 
whose solution is the steady-state density matrix for which we must solve. Section 3 
explores the use of eigenvector solution methods, their dependence on LU factorization, 
and the requirement of well-conditioned eigenvalues. In Sec. 4 we make use of the 
unit trace property of the density matrix to recast the eigenvalue problem as a linear 
system of equations, and discuss matrix permutation techniques designed to reduce £ 11 - 
in in the LU decomposition that affects both memory and runtime. Section 5 discusses 
preconditioned iterative solvers and highlights the difficulties in implementing these 
techniques as general solution methods. Numerical simulations of several canonical 
quantum optical systems implementing these solution methods are presented in Sec. 6 . 
Finally, Sec. 7 concludes with a brief discussion of the results and directions for possible 
future research. 

2. Steady-state density matrix 

For an open quantum system with decay rates larger than the corresponding excitation 
rates, the system approaches the steady-state density matrix pss as f —)■ cxd satisfying 
the eigenvalue equation 



( 1 ) 


where C is the Liouvillian super operator. In many quantum optical systems, the 
Liouvillian takes the most general Lindblad form 




( 2 ) 


with the dissipative terms given by 


T^\Ck,p] = \i2apcl - pc'A - cAp], 


( 3 ) 


where the Ck = are collapse operators determined by the rates 7 ^ and 

operators for each dissipation channel that couples the system to its environment or 
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measurement apparatus. This form of the Liouvillian is not required for the analysis 
presented here, and can be substituted with any time-independent Liouvillian. 

If the system Hamiltonian and collapse operators are time-independent, or can be 
transformed into such a form by moving to an interaction representation, then Eq. (1) 
can be recast as a sparse matrix eigenvalue equation 

Cpss = Opss, (4) 

where pss is the dense vector formed by vectorization (column stacking) of pss, and C 
is the sparse matrix representation of the Liouvillian in a given basis representation. In 
what follows, we will always assume a Fock state basis for oscillator modes, and a z-basis 
representation for spin operators. Given the non-Hermitian form of the Liouvillian, the 
solutions to Eq. (4) are non-trivial, and the methods by which this equations can be 
solved is the focus of the remainder of this work. 

3. Eigenvalue Methods 

Let us assume that the eigenvalues of the Lindblad operator C G C"" x C"", formed 
from a composite Hilbert space with dimensionality dim d-L = \/n, can be diagonalized 
with eigenvalues ordered as |Ai| > IA 2 I > ... > |A„|. Using the spectral decomposition 
C = VAV^^ where A = diag(Ai,..., An) and V is the matrix with the eigenvectors of 
jC as the column entries, repeated application of the Liouvillian obeys the relationship 

= VA^V^ (5) 

For the vector xq = Vx G C”, generated from a random vector x, the application 
of the Lindblad operator fc-times can be expressed as 



with (A*/Ai)^ ^ 0 for i > 1 and a sufficiently large k provided that xi 7 ^ 0 , i.e. the 
initial vector has a nonzero component in the direction of the dominate eigenvalue, and 
Ai is well separated from A 2 . The former criterion is automatically met by starting with 
a random vector. Provided that these conditions are satisfied then jC^x is approximately 
parallel to the dominate eigenvector vi with error decreasing linearly at an asymptotic 
rate given by |A 2 |/|Ai|. The power series solution for the dominate eigenvector can 
therefore be written as 

_ Axk _ A^xq 

\\Axk\\ ||A*^foir 

where 11 ■ 11 is a chosen vector norm, e.g. the L 2 -norm. Since any eigenvalue problem is 
equivalent to Ending the roots of the characteristic polynomial, from the Abel-RufEni 
theorem, it is impossible to directly compute the eigenvalues for any matrix with 
dimension > 5 [17]. Power iteration itself is not suitable for finding the steady-state 
density matrix as the zero eigenvalue is not the dominate eigenvalue of jC. Furthermore, 
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if IA 2 I/IA 1 I ~ 1, then the convergence of this method is prohibitively slow. The former of 
these shortcomings can be addressed by modifying the power iteration method, whereas 
the latter is not encountered in practice. 

To transform the power method into an algorithm suitable for our purposes, recall 
that this method relies on the identity given in Eq. (5). Given a function f{z), 
dehned locally via a convergent power series, with the eigenvalues of jC in the radius of 
convergence, then one can dehne / (C) via the same series, and / (C) = Vf (A) 
with / (A) = diag[f(Ai),..., f(An)]. For the specihc choice of f{z) = {z — a)~^ one has 

= V{A-aX)-^V~\ ( 8 ) 

where X is the identity matrix and the shifted-Liouvillian is denoted by ~t,. 

Performing power iteration on Eq. (8) converges to the dominate eigenvalue Aj 
where (Aj — cr)~^ is maximal; the solution vector corresponds to the eigenvalue in the 
complex plane closest to a. This method is known as the Shifted-Inverse Power technique 
[17] where the iterations are given by 



Although Eq. (9) is formally correct, in practice, it is both computationally and memory 
intensive to calculate the inverse of a sparse matrix, as the inverse is in general dense. 
However, the product is equivalent to solving the linear system = x^, and 

thus the inverse-power method requires computing the direct sparse LU factorization 
of the matrix ~E. This factorization need only be done once, with the solution for 
different requiring only computationally efficient forward and backward substitution. 
The convergence of shifted-inverse iteration is always linear in the number of iterations. 
However, the rate of convergence is determined by the accuracy in choosing a and the 
proximity of the dominant eigenvalue to its nearest neighbor. As we know the exact 
value of the desired eigenvalue, <7 = 0, and the dominant eigenvalue is well separated, 
the number of iterations required to reach a given error tolerance is quite small, usually 
requiring only a single iteration to reach convergence for a random input vector. This 
holds true even when converging to machine precision. As such, the computational and 
memory intensive portion of inverse-power iteration lies in the LU factorization of the 
shifted Liouvillian operator. 

While Eq. (4) requires Ending the eigenvector associated with the zero eigenvalue, 
in practice, the choice of a is often nonzero and small. When working with double¬ 
precision numbers, a value of a = 10“^® is typically used. The efiect of this slight 
shift is to guarantee that the matrix has a nonzero main diagonal, while simultaneously 
perturbing the matrix away from being strictly singular. Such a transformation is 
crucial to the successful reduction in memory consumption for sparse LU factorization 
provided by the sparse reordering methods discussed in Sec. 4 where a nonzero diagonal 
is assumed. 
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Given that it is nearly singular, this matrix is extremely ill-conditioned, and one 
would not expect an accurate result for the the steady-state density matrix. However, 
it can readily be shown that the effectiveness of inverse iteration does not depend 
on the condition number of ~t, [17]. Rather it is the condition number of the zero 
eigenvalue and its eigenvector that determines the stability of the solution. For a non¬ 
degenerate eigenvalue, the associated condition number gives a measure of how close to 
degenerate the eigenvalue becomes for a given permutation of the system matrix. Unlike 
Hermitian matrices, where the eigenvalue spectrum is stable against such perturbations 
[17], the eigenvalues of non-symmetric matrices can be arbitrarily ill-conditioned [18], 
and the associated loss of accuracy can give rise to considerable errors in the computed 
eigenvectors. As the convergence rate of inverse power iteration also depends on the 
eigenvalue of interest being well separated, poorly conditioned eigenpairs lead to poor 
performance when using this algorithm. Prior to computing the eigenvalues, applying 
a similarity transform 'DC/D~^^ where X> is a diagonal matrix, that reduces the norm 
of or the condition number of a subset of eigenvalues can be advantageous [19]. The 
possibility of ill-conditioned eigenvalues must be kept in mind when using eigenvalue 
solving methods, but is rarely encountered in practice ]:. Indeed, the inverse power 
method typically converges in a single iteration. Supplementing the numerical solution 
with analytical results, and/or varying parameter values over a given range and looking 
for discontinuous jumps in quantities such as density matrix metrics, for example the 
fidelity or trace distance, and expectation values can help in part help to diagnose the 
presence of an ill-conditioned zero eigenvalue. 

Finally, Eq. (4) can also be solved using algorithms suitable for finding a subset 
of eigenvalues of a sparse matrix such as, for example, ARPACK [21]. This method 
works by building the Krylov subspace formed from repeated applications of the shifted 
Liouvillian on the initial random vector xq 

JCn = span (fo, Axo, A^xq, A"-~^xo} , (10) 

with A = 5 followed by an orthogonalization procedure based on Arnold! 

iteration. For a non-symmetric matrix, building the Krylov subspace requires first 
performing the sparse LU factorization of it and, as in the inverse-power method, this 
factorization represents the bulk of the runtime and memory usage in this method. 

4. Direct factorization 

As an alternative to solving for the zero eigenvalue, it is possible to find a direct (i.e 
non-iterative) solution to pss via sparse LU decomposition [22] by making use of the 
unit trace property of the density matrix to recast the eigenvalue equation as a linear 

f We are unaware of any example where the conditioning of the zero eigenvalue leads to a break down 
in the inverse-power method. However, ill-conditioning of eigenvalues can give spurious results when 
computing the general eigenvalues of a Liouvillian operator [20]. 
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system of equations 





/ Trpss ^ 

Cpss = [C + wT\ Pss = 

0 

7 '"T pSQ 

0 


V ^ ) 


V : / 


where ta is a controllable weighting factor and 7^ is a matrix that enforces Tr (pss) = 1, 
with ones along the upper row in the columns corresponding to the locations of the 
diagonal elements in pss- Although the choice of row for the nonzero values in 'T is 
arbitrary, importantly, this matrix always has a nonzero element in the hnal column. In 
contrast to iterative methods, the eigenvalue condition number has no effect on direct 
factorization, making this method attractive for hnding steady-state solutions to open 
quantum systems. Note that the restriction Tr (pss) = 1 is already included in the 
Liouvillian operator C, and its matrix representation jC. Here, this constraint is used 
to simply add a constant vector to both sides of Eq. (4). 

When performing the LU factorization of a sparse matrix, nonzero elements arise 
in the L and U factors that are not present in the original matrix; the sparse structure 
of LU is not the same as jC [22]. This hll-in, must be minimized in order to reduce both 
the memory requirements for storing the LU factors and the runtime of factorization, 
both of which scale with the number of nonzero matrix elements NNZ [22] . A convenient 
measure of the hll-in is given by the ratio of nonzero elements in the L and U factors to 
those in the original Liouvillian (Lnnz + Unnz)/>^nnz called the hll factor. The hll-in is 
sensitive to the order in which the rows and columns of a sparse matrix are operated on, 
and in particular, to the matrix bandwidth size and prohle [23]. For a non-symmetric 
sparse matrix A = {oij}, one can dehne the upper and lower bandwidths, ‘ub’ and ‘lb’ 
respectively, to be 


ub = maxfi — i); lb = max(i — i) 


( 12 ) 


The total bandwidth B is then the sum B = ub-|-lb-l-l, where the one takes into account 
the main diagonal [24]. Likewise, we dehne the upper prohle up and lower prohle Ip as 

up = Vmax(j-i); Ip = Vmax(i - j), (13) 

aij^O aijT^O 

1 J 

such that the total prohle is P = up -|- Ip. From the dehnition of the bandwidth, we see 
that imposing the trace condition with the matrix 7^ gives an upper bandwidth that 
is equal to the square of the dimensionality of the Hilbert space. Therefore, the total 
bandwidth is bounded below by this value, suggesting that the hll-in for the LU factors 
rises rapidly with system size. Arising from the use of the trace matrix 7^, and not 
the form of the Liouvillian, this bandwidth scaling holds for any system. It is therefore 
advantageous to hnd a matrix permutation that moves the elements of 7^, as well as all 
other elements, toward the diagonal, thereby reducing the overall bandwidth and prohe 
of the modihed Liouvillian. 

Finding the minimal bandwidth of a matrix is an NP-complete problem [25], and 
therefore several heuristics have been developed that attempt to minimize the bandwidth 
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while remaining computationally efficient. One common technique is to permute the 
rows and columns of a symmetric matrix based on the Cuthill-McKee (CM) ordering 
[26]. Taking the structure of a symmetric sparse matrix as the adjacency matrix of a 
graph, CM ordering does a breadth-first search (BFS) of the graph starting with a node 
(row) of lowest degree, where the degree of the node is defined to be the number of 
nonzero elements in the matrix row, and visiting neighboring nodes in each level-set 
in order from lowest to highest degree. This is repeated for each connected component 
of the graph. CM ordering also reduces the profile of the matrix, and it was noticed 
that reversing the CM order, the RCM ordering, gives a superior profile reduction while 
leaving the bandwidth unchanged [23]. Since RCM operates on the structure of a matrix, 
only the locations of nonzero matrix elements, and not their numerical values, are used 
in this reordering. Modified RCM methods, such as the Gibbs-Poole-Stockmeyer (GPS) 
algorithm [27], provide a similar bandwidth and profile reduction. In the Fock basis, 
the effect of RCM reordering is to permute the basis vectors such that the Fock states 
are no longer in ascending order. 

As the Liouvillian operator is itself non-symmetric, we calculate the RCM ordering 
of the symmetrized form C, , and apply the resulting row and column ordering to C, 

- X ~T ~ 

to obtain Z^rcm- One can also used the symmetrized product CC or C C, however this 
would lead to dense matrices and large memory consumption unless done symbollically. 
While RCM reordering of a symmetric matrix is guaranteed to reduce the bandwidth 

~ ~ T 

and profile, or at least do no worse [23], the need to operate on C + C rather than 
C, itself invalidates this assurance. RCM reordering oi C, overestimates the graph 
structure of the original modified Liouvillian, and there is little a priori information 
from which to judge whether RCM will significantly reduce the bandwidth and profile 
when the structure of the Liouvillian operator is varied. This is especially true when the 
structure of C, is far from symmetric, and the two matrix structures differ by a significant 
amount. Structural changes such as setting to zero numerical system parameters in the 
Hamiltonian or collapse operators can also affect the performance of RCM ordering. 

Along with symmetric reorderings that reduce both the bandwidth and profile of the 
Liouvillian, it is possible to reduce the fill-in via non-symmetric permutations that sort 
only the columns, or rows, of the matrix. In the present case of non-symmetric matrices, 
the best performing general purpose reordering strategy is the Column Approximate 
Minimum Degree (COLAMD) ordering [28] that is the default reordering method in 
the SuperLU linear solver used here [29], as well as in many commercial solvers such as 
those found in Matlab [30]. This method finds the symmetric permutation oi C C that, 
when applied as a column permutation to jC, minimizes the worst case fill factor for an 
arbitrary permutation of the matrix rows. In this case, the product C C is computed 
symbolically to avoid the explicit construction of the resulting dense matrix. 

Minimizing the fill-in using COLAMD reordering requires a matrix with a no n zero 
main diagonal. In the case of the inverse-power method, this condition was automatically 
satisfied when applying the eigenvalue shift. In contrast, the diagonal of the modified 
Liouvillian can, in general, contain zero elements that must be permuted away. As 
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the modified Liouvillian is necessarily non-singular, it is always possible to find a non- 
symmetric permutation that makes the diagonal zero-free [22] . A simple method, using 
only the matrix structure, is Maximum Bipartite Matching (MBM) [31]. Here we 
implementation a variation of the MBM algorithm using a BFS method processing 
the columns of the matrix in order starting with the column possessing the largest 
absolute value element. From this initial column, the algorithm proceeds via a BFS 
where the rows are processed in descending order based on the absolute value of each 
row element. The corresponding columns are marked as being visited, and the routine 
continues until all columns have been processed. The goal of this weighted variant of the 
MBM algorithm is to permute the matrix to a zero-free diagonal while simultaneously 
attempting to increase the sum of the diagonal elements. This can help, in part, to 
overcome difficulties in the construction of approximate LU factorizations discussed 
in Sec. 5 [32] . Although the diagonal sum is not guaranteed to be maximized, this 
method is computationally less intensive than maximum product traversal [33] and 
related methods. Note that RCM permutation does not require a non-zero diagonal, 
and in many cases, applying a non-symmetric permutation such as MBM before RCM 
can lead to both a larger bandwidth and profile as the matrix structure becomes less 
symmetric. 


5. Iterative Solvers 


The main drawback in using direct LU decomposition is the poor scaling in terms of 
both runtime and memory requirements, even when symmetric and/or non-symmetric 
reordering methods are employed, as the dimension of the matrix increases. Therefore, 
for sufficiently large sparse matrices, iterative methods are the only available solution 
method [34], with the most common choice being iterative Krylov solvers [24]. While 
iterative methods require less memory and fewer numerical operations than direct 
methods, these methods usually require preconditioning to achieve a sufficient tolerance 
level and a reasonable convergence rate [34]. The goal of preconditioning is to convert 
the original system of equations, Eq. (11), into the modified linear system 


M ^Cpss 




0 

V : / 


(14) 


where A4 is the preconditoner. Convergence is improved provided that the condition 
number of M. is significantly lower than that of C itself. As the condition number 
of a sparse matrix grows with the dimensionality [22] , preconditioning is required when 
solving large sparse systems. The best preconditioner is obviously A4 = C, however 
this is equivalent to solving the original system of equations. Instead, it is possible 
to efficiently solve for an approximation of the inverse A4 ~ C. The application of a 
suitable preconditioner should make the modified linear system Eq. (14) easy to solve, 
while the preconditioner itself should be simple to build and apply as one or more 
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matrix-vector products are required for each iteration. Moreover, the condition number 
of A4 should not be so large as to affect convergence. 

The iLU class of preconditoners are constructed from an incomplete (approximate) 
LU factorization to the modihed Liouvillian jC by discarding hll-in elements based on a 
designated dropping strategy. The method used here is an incomplete LU factorization 
with dual-threshold and pivoting (iLUTP) [24], where a drop tolerance d and allowed 
hll-in p are specihed such that all hll-in smaller than d times the inhnity-norm of a row 
are dropped, and at most only p ■ jCnnz hll-in elements are allowed. Note that these 
parameters are not independent. In the limit where d = 0, the iLU preconditioner 
returns the complete LU factorization, and the hll-in for the direct factorization can be 
viewed as an upper-bound on the size of the preconditioner. The condition number of 
M can vary as a function of the drop tolerance, and therefore decreasing d does not 
necessarily improve the convergence rate [32]. A convenient estimate for the condition 
number of the approximate inverse is given by ||A4e||oo, where e = (1,1,.. ■)^ [32] and 
II ■ I loo is the inhnity-norm. Although this measure gives only a lower bound on ||A4||oo, 
it has proved to be an useful benchmark for the stability of A4 [16]. Finding the best 
combination of parameters for a given matrix is a trial and error process, thus preventing 
the use of these methods as a general purpose solver. 

While the generation of robust iLU preconditioners in the case of an Hermitian 
matrix is now well established, the existence and stability of preconditioners for non- 
symmetric matrices is understood to a lesser extent [34]. In the non-symmetric case, 
iLU preconditioners typically fail due to a lack of diagonal dominance, zeros along the 
diagonal, and inaccuracies in the approximate inverse due to the dropping of small 
nonzero elements [32]. Moreover, even if a preconditioner is found, its condition number 
can be larger than that of the original matrix and hence convergence is lost. However, 
studies have shown that these failures may be overcome by utilizing symmetric and/or 
non-symmetric reorderings of the matrix to maximize the sum of the diagonal elements, 
and reduce the overall bandwidth and prohle [32, 35, 34]. The majority of these 
reordering strategies are developed for matrices with symmetric structure (graphs) and 
therefore their application to non-symmetric problems with differing matrix structures 
must be evaluated on a case by case basis (see Ref. [34] and references therein); there 
is no universally applicable reordering scheme for non-symmetric matrices. However, 
previous studies on symmetric [36] and non-symmetric sparse matrices [35] have found 
that RCM reordering improves iLU conditioning at the expense of a marginal increase in 
hll-in when compared with other reordering strategies. An intuitive explanation for the 
case of symmetric matrices is given in Ref. [36]. In contrast, some of the best methods 
for hll-in reduction such as COLAMD do poorly when applied to iLU factorization [34] . 

Having successfully found a preconditioner for a given drop tolerance d and hll- 
in p, a solution to the steady-state density matrix can, in principle, be found using 
iterative Krylov solution methods [24] designed for non-symmetric matrices. Here, 
we perform the preconditioned iterations using both restarted generalized minimum 
residual (GMRES) [37] and stabilized biconjugate gradient (BiCGSTAB) solvers [38]. 
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The GMRES algorithm constructs the Krylov subspace Eq. (10) for the preconditioned 
system and finds the vector in this subspace with minimal residual, as measured by the 
L2-norm, via Arnoldi iteration. Unlike the symmetric case, where there is a simple three- 
step recurrence relation for finding orthogonal Krylov vectors [39] , the orthogonalization 
procedure must be done explicitly, which is both computationally and memory intensive, 
growing as 0{m^) where m is the iteration count. As such, a restarted variant of 
the original GMRES method is typically used, where the approximate solution vector 
after m-iterations is calculated in the subspace /Cm, where m is much less than the 
dimensionality of the system in question, and then is used as the new initial condition 
for the next m-iterations. An optimal value for m, minimizing excessive work, while still 
leading to convergence is problem specific and must be found through trial-and-error. 
The BiGGSTAB algorithm can be viewed as a biconjugate-gradient step followed by 
a GMRES step with m = 1 giving a local reduction in the residual vector, leading to 
smoother (stabilized) convergence behavior [39]. This method is less computationally 
intensive than GMRES, but comes at the cost of, in general, poorer convergence 
properties. 

Regardless of which method is used, an acceptable tolerance on the L2-norm of 
the calculated residual vector must be supplied as a stopping criterion. In addition, 
one can specify a maximum number of iterations to perform, before convergence is 
assumed to be not possible. Setting strict tolerance values, e.g. 10“^® requires building 
accurate, well conditioned preconditioners and possibly many iterations, thus increasing 
the computation time and memory storage requirements. In contrast, allowing more 
moderate values for the solution tolerance, can give a significant performance benefit 
in some cases. For a given solution tolerance, the optimal values for the preconditioner 
fill factor p and drop tolerance d can vary in a non-trivial manner and must be found 
empirically. 

Here we are interested in using these preconditioned iterative methods in the 
solution for the modified Liouvillian Eq. (11), as well as in the LU factorization of 
~t, used in the inverse power method. Given that the inverse-power method itself is an 
iterative method, we designate this method as the doubly-iterative inverse-power method 
to distinguish it from the usual inverse-power technique using a direct LU factorization 
of ~t,. Although using an iterative solver in the inverse-power method is typically not 
ideal due to the need to solve for a new right-hand vector at each iteration, our method 
relies on the knowledge that only a single inverse-power iteration is typically needed 
when solving the density matrix for a given ~E. Therefore, it is possible to obtain a 
performance improvement provided that the system is large enough, and one forms a 
high-quality preconditioner. In order for this doubly-iterative method to converge, the 
tolerance on the iterative iLU solution vector must be stricter than the tolerance used 
to stop the inverse-power iterations. In what follows, we will always set the tolerance on 
the iterative iLU solver to be an order of magnitude lower than that of the inverse-power 
iterations. 


Steady-state solution methods for open quantum optical systems 

6. Numerical Simulations 


12 


Given the non-Hermitian structure of the Liouvillian operator, the optimal solution 
method for hnding the steady-state density matrix of the system will, in general, be 
problem specihc [34]. Therefore, in order to gauge the general performance of each 
solution method and matrix reordering methods, we apply both eigenvalue and direct 
solvers to the steady-state solution for a set of common quantum optical systems. Note 
that there is an ambiguity when building the sparse matrix representation of a given 
system Liouvillian since the tensor product, and therefore matrix structure, depends on 
the ordering of the operators involved. For concreteness we will specify the operator 
ordering, although the choice ordering does not affect the results presented here. 

First, we consider the canonical driven Jaynes-Cummings model 

— = —Ad'^d + uJadfd- + g {d + a^) (d_ -|- (a -|- a"*") , (15) 

/1 

where A = ujd—ujc is the detuning between the driving frequency Ud and cavity frequency 
Uc- The frequency for the two-level system is given by Wa, while g represents the 
coupling strength between the atomic and cavity modes, and E gives the amplitude 
of the external drive in units of frequency. Note that we have not applied the rotating 
wave approximation to Eq. (15) as we want a numerical solution to the full Hamiltonian. 
In addition, we consider a Liouvillian (2) formed from cavity collapse operators 
^J~Kil^rnfff)d and corresponding to a thermal bath with an average thermal 

occupation number Uth at Uc, and qubit dissipation described by where 

(T_ is the qubit lowering operator. The constants k and 7 characterize the energy decay 
rates for the cavity and qubit, respectively. Here we are interested in the solution as 
the number of cavity states increases, and therefore £x the numerical parameters to be 
(in units of the cavity frequency cUc): A = 0, cUa = 1, = 0.25, E = 1, k = 5 x 10“^, 

7 = 0.05, and rith = 1- The modihed Liouvillian C representing Eq. (15) together with 
the collapse terms, is an example of a system that has zero elements along the diagonal. 
As such, before using COLAMD ordering, we will always apply the weighted variant of 
MBM to hrst permute the diagonal to be zero-free. Here the operator ordering is given 
by d_ = (j_ ® i and a = I ® a 

In addition, we consider a spin-chain consisting of A-spins where the hrst spin is 
being driven externally 



i=2 


n=l 


( 16 ) 


where the frequency of each spin is denoted hy Un, 5 = Ud — ooi is the detuning between 
the drive and resonant frequency for the hrst spin, H represents the driving strength. 
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Figure 1. a-c) Sparse matrix structure of the shifted Liouvillian super operator ~t, 
for the Jaynes-Cummings Hamiltonian (15) with = 16 cavity states using natural, 
RCM, and COLAMD orderings. The RCM bandwidth and profile is 137 and 93329, 
respectively, corresponding to a bandwidth reduction of 1.4 and profile reduction 
of 1.7. d-f) Sparse matrix representation of the shifted Liouvillian for the driven 
spin-chain (16) consisting of 5 spins in natural, RCM, and COLAMD ordering. The 
RCM bandwidth and profile are 167 and 117848, respectively. This corresponds to a 
bandwidth reduction of 6.1, and a profile reduction of 4.9. (g-i) Shifted Liouvillian for 
an optomechanical system (17) comprised of A'c = 4 cavity states and A^m = 8 states in 
natural, RCM, and COLAMD ordering. For RCM ordering, B = 229 and P = 163176 
corresponding to bandwidth and profile reductions of 2.3 and 2.5, respectively. 


and the are the nearest neighbor conpling strengths. For simplicity, we assnme 

each spin has an identical freqnency ojn = 27r, and conpling strengths Jx'^y^z = 0.1 x 27r. 
The driving amplitnde is taken to be O = a;i/2. Finally, we also assnme that each spin 
has a dephasing term given by ■s/ySz, where 7 = 0 . 01 . 

Lastly, we look at an optomechanical Hamiltonian in a frame rotating at the driving 
freqnency ujd [40] 

— = —Ad~^d + uJmb'^b + go{b + b~^)d~^d + E (d a^) , (17) 

n 

where A = Ud — oJci d and b are the annihilation operators for the cavity and mechanical 
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modes, respectively, Um is the mechanical resonance frequency, and go is the single¬ 
photon radiation pressure coupling strength. In addition, we consider the experimentally 
relevant situation where the cavity mode is coupled to a thermal bath near zero 
temperature characterized by the collapse operator -y/^a, while the mechanical mode 
is in a nonzero thermal environment with associated collapse terms ■\/ 7 (l -|- nth)& and 
y/'ynthb'^- The numerical parameters, in units of LOm are: A = 0, 5^0 = 0.4, E = 0.1, 
n = 0.3, 7 = 10“^, and nth = 1, while the operator ordering is a = a ® I and 6 = I 0 6. 
Here we fix the number of cavity states to iVc = 4 while varying the number of states 
Nm for the mechanical resonator. 

Given the reliance of each solution method on LU or iLU factorization, and the 
importance of the sparse matrix structure on the fill-in and stability of these solvers, 
in Fig. 1 we plot the matrix structures (non-zero elements) for the shifted Liouvillian 
it used in the inverse-power method for the three representative systems using natural, 
RCM, and COL AMD ordering. In addition, we give the bandwidth and profile reduction 
factors, and P(jt)/P(jthiCM) respectively, when using RCM ordering. 

In all three cases, the matrix structure is sufficiently close to symmetric such that 
RCM ordering reduces both the bandwidth and prohle for each of the shifted system 
Liouvillian operators. As such, we would expect that LU factorization using this ordering 
results in fewer nonzero elements, and thus outperforms the naturally ordered Liouvillian 
in terms of both runtime and memory consumption. Similar reasoning applies in the 
case of iLU factorization, however the stability of the approximate inverse is also critical, 
and both the structure and numerical values of the elements play a crucial role. The 
comparison to COLAMD reordering is not as straightforward, and must be evaluated 
by performing the numerical factorization. 

In Fig. 2 we present the £11 factors and solution times for each system using the 
inverse-power method, where the factorization of it is performed using LU factorization 
using natural, RCM, or COLAMD ordering, and the doubly-iterative inverse-power 
solver based on iLU factorization employing RCM and COLAMD permutations using 
both GMRES and BiCGSTAB solvers. In addition, the bandwidth and profile reduction 
factors, and in the case of iLU factorization condition estimates for the approximate 
inverse M, are also given. In all cases, we set the solution tolerance for the inverse- 
power solver to 10“^^. All simulations were carried out using the QuTiP framework 
[41, 42, 43], where we have used the default setting for the various solvers save for the 
allowed fill-in p that we set to p = 300 so as to accommodate all of the simulation results 
using a single set of parameters. 

Figure 2 highlights the benefit of both RCM and COLAMD ordering of the shifted 
Liouvillian operator in LU factorization for each system. For the largest Hilbert space 
dimensions considered here, the reduction in fill-in corresponds to a factor of five or 
more savings in terms of the memory requirements for direct LU factorization. iLU 
factorizations reduce these memory requirements even further, by nearly another order 
of magnitude, except in case of the Jaynes-Cummings model, where RCM ordering gives 
a larger fill factor even though iLU factorization should result in fewer nonzero elements. 
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Figure 2. (a) Fill factors for the complete LU decomposition of the shifted Liouvillian 
^ used in the inverse power method for a Jaynes-Cummings system as a function of 
the number of cavity states using RCM (solid-red), COLAMD (solid-blue), and natural 
(black) orderings. iLU factorization fill factors for RCM (dashed-red) and COLAMD 
(dashed-blue) orderings are also presented. Inset figure shows the bandwidth (solid) 
and profile (dashed) reduction factors when using RCM ordering, (b) Fill-factors 
for the driven spin-chain for the same methods used in (a), (c) Fill-factors for the 
shifted optomechanical Liouvillian as a function of the number of mechanical Fock 
states Nm- The natural ordering (black) cannot be computed beyond Nm = 40 
due to memory limitations, (d) Solution times for finding the steady state density 
matrix to a tolerance of 10“^^, using the inverse-power method with RCM (solid- 
red), COLAMD (solid-blue), or natural (black) ordering. In addition, solutions using 
iterative GMRES (solid-dots) or BICGSTAB (dashed) solvers for LU factorization 
based on RCM (red) and COLAMD (blue) orderings are included. The inset shows the 
the approximate condition number of the iLU factorization A4 based on these orderings, 
(e) Solution times for the driven spin-chain model. Here, both the BICGSTAB and 
GMRES iterative methods based on COLAMD ordering fail for A^ = 8 spins due to a 
failure to converge to the desired tolerance in 1000 iterations, (f) Timing results for 
the shifted optomechanical Liouvillian. Memory restrictions limit the solution using 
natural matrix ordering to Nm < 40. 


COLAMD iLU factorization also does no better than the full LU factorizations. This 
suggests that the default values used in the creation of the preconditioner are ill-suited 
for this particular system. In contrast, RCM ordering of the spin-chain Liouvillian 
reduces the memory footprint of factorization by over a factor of twenty compared to 
natural ordering, and a factor of two reduction when compared with COLAMD iLU 
factorization using the same parameters. 

Given that the solution time is proportional to the number of nonzero elements 
in the LU or iLU factors, the solution times for each method are largely in step with 
the £11 factors generated for each reordering. The biggest gains in performance occur 
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Figure 3. (a-c) Sparse matrix structure of the modified Liouvillian C for the Jaynes- 
Cummings model under natural, RCM, and COLAMD ordering. The location of the 
elements for T (red) are given for both the natural and RCM orderings and are enlarged 
for clarity. Under RCM ordering, the modified Liouvillian has bandwidth B = 139 
and profile P = 93300, giving bandwidth and profile reduction factors of 8.0 and 1.7, 
respectively, (d-f) Matrix structure for the driven spin-chain in natural, RCM, and 
COLAMD order. Under RCM ordering, the elements of T have been permuted from 
first to last row giving B = 405 and P = 248489, corresponding to a bandwidth 
reduction of 3.8 and profile reduction factor 2.35. (g-i) The modified optomechanical 
Liouvillian after natural, RCM (B — 239, P = 163299), and COLAMD permutations. 
The RCM bandwidth and profile reduction factors are 5.4 and 2.5, respectively 


for the driven spin chain where LU factorization nsing RCM reordering ontperforms 
COLAMD by nearly a factor of six, and natnral ordering by over an order of magnitnde. 
For iterative solntions, the solntion times when using the BiCGSTAB method are 
generally lower than those using the GMRES solver for each ordering indicating that 
the BiGGSTAB method achieves convergence in a relatively few number of iterations. 
For the driven spin-chain with N = 8 spins, both GMRES and BiGGSTAB iterative 
methods fail to converge to the requested tolerance value when using GOLAMD ordering. 
This lack of converge can be understood by examining the condition number estimate 
for the approximate inverse. Fig. 2e, that suggests that the computed inverse is too 
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ill-conditioned to reach the reqnested level of accuracy. The relatively lower condition 
number achieved using RCM ordering is in line with earlier investigations [16] that 
suggest that the stability provided by this ordering plays a key role in the success of 
iterative methods for open quantum systems. Note that at smaller system sizes, the 
conditioning of the approximate inverse is not as crucial as in the large dimensional case 
as the system Liouvillian itself is better conditioned for relatively smaller Hilbert space 
sizes. 

Turning now to direct solutions to Eq. (14), in Fig. 3 we plot the matrix structure 
for the modihed Liouvillian C for each example system, and in particular, highlight 
the location of the matrix elements corresponding to the trace preserving matrix 'T 
that gives the large upper bandwidth in the naturally ordered Liouvillian operators. In 
addition we give the bandwidth B {C)/ B{jC jicm) and prohle P{jC)/P{jC^cm) reduction 
factors when using RCM ordering for the modihed Liouvillian. For both the Jaynes- 
Cummings and optomechanical modihed Liouvillians, the total bandwidth and prohle 
after RCM permutation are nearly the same as those for the shifted Liouvillian, and the 
introduction of 'T should not greatly ahect the factorization properties of these systems. 
In stark contrast, the bandwidth of the driven spin-chain modihed Liouvillian is nearly 
four times greater than the shifted Liouvillian, Fig. le, as the need to accommodate the 
addition of the trace preserving matrix, and the associated loss of symmetry, reduces the 
ehectiveness of RCM ordering. In this case, we expect the solution time to be markedly 
longer than that of the inverse-power method. 

Finally in Fig. 4 we present the hll-in and solution times for solving the steady- 
state density matrix calculated from the direct LU and iterative iLU factorizations of 
the modihed Liouvillian (14) for the three representative systems. The results for hll- 
in closely parallel those in Fig. 2, however the hll-in for the iLU factorization using 
RCM ordering of the Jaynes-Cummings Liouvillian is more stable when varying the 
number of cavity states. The beneht of direct LU factorization using RCM reordering 
for the driven spin-chain is also reduced due to the larger bandwidth, however it is still 
a factor of two lower than that of COLAMD. The RCM LU and iLU factorization for 
the optomechanical system still perform well due to the negligible increase in bandwidth 
and hll-in compared to those used in the inverse-power method. 

Once again, the solution times when using complete LU factorization are directly 
determined by the hll-in, and are longer than those of the inverse-power method as the 
addition of the matrix 'T increases the total number of nonzero elements in the modihed 
Liouvillian. that, in combination with higher hll-ins, gives an increased runtime. The 
large increase in bandwidth seen for the driven spin-chain in turn leads to longer solution 
times that, for N = 8, are four times longer than the inverse-power methods. In cases 
such as this, it is advantageous to compare the RCM bandwidth for the shifted and 
modihed Liouvillians, and pick the appropriate solution method based on the result. 
Fortunately, calculating the RCM ordering and bandwidth takes only a fraction of the 
total computation time, and can readily be checked before performing the more costly 
factorization step. 
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Figure 4. (a) Fill factors for the complete LU decomposition of the modified 

Liouvillian C for the Jaynes-Cummings system as a function of the number of cavity 
states using RCM (solid-red), COLAMD (solid-blue), and natural (black) orderings. 
iLU factorization fill for RCM (dashed-red) and COLAMD (dashed-blue) orderings are 
also given. The inset gives the bandwidth (solid) and prohle (dashed) reduction factors 
when using RCM ordering, (b) Fill factors for the modified Liouvillian corresponding 
to the driven spin chain as a function of the number of spins for the various factorization 
methods, (c) Fill factors for an optomechanical system with Ac = 4 as the number of 
mechanical states is varied. Here, the natural ordering fill factor becomes larger than 
the available memeory after Nm > 50. (d) Solution times for finding the steady-state 
density matrix to a tolerance of 10“^"*^, using direct LU factorization with RCM (solid- 
red), COLAMD (solid-blue), or natural (black) ordering of C. Solutions using iLU 
factorization based on RCM (red) and COLAMD (blue) for iterative GMRES (solid- 
dots) or BICGSTAB (dashed) solvers are also presented. The inset shows the the 
approximate condition number of the iLU factorization A4 based on these orderings, 
(e) Solution times for the driven spin-chain model. Here, both the BICGSTAB and 
GMRES iterative methods based on COLAMD ordering fail for A = 8 spins due to a 
failure to converge to the desired tolerance in 1000 iterations, (f) Timing results for 
the modified optomechanical Liouvillian. Memory restrictions limit the solution using 
natural matrix ordering to A^ < 50. For all simulations, the weighting factor w in 
Eq. (11) is set to the average of the diagonal elements in the modified Liouvillian. 


For iterative iLU methods, the results presented in Fig. 4 indicate that the preferred 
Krylov solver, in terms of runtime, is now the GMRES method. This is an indication 
that the number of iterations needed to achieve our tolerance goal using the BiCGSTAB 
method, is computationally more intensive than building the m = 20 (the default 
value in QuTiP) Krylov subspace for the GMRES solver and using fewer iterations. 
In addition, the BiGGSTAB method completely fails for the optomechanical system 
as the residual vector computed during the iteration process becomes orthogonal to 
the original residual, leading to a break down in the algorithm [39]. The iterative 
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techniques using iLU factorization based on COLAMD ordering once again fail for 
the driven spin-chain when N = 8 due to the poor conditioning of the approximate 
inverse. Although not seen here, the condition number of A4 also becomes important 
for large dimensional optomechanical systems where iLU factorization was shown to 
fail for COLAMD ordering, but converged to machine precision tolerance when using 
RCM [16]. In contrast, RCM ordering is found to lead to convergence to the requested 
tolerance value for the GMRES method in all cases, and is now the fastest solution 
method for the spin-chain and optomechanical systems at large Hilbert dimensions. 
Moreover, the memory savings obtained from iLU factorization using RCM is close 
to an order of magnitude smaller than full LU factorization using any reordering. In 
contrast, the direct solution of the Jaynes-Cummings model using COLAMD is still the 
optimal method. Here, the dimensionally of the underlying Hilbert space is never large 
enough for iterative methods and matrix permutation to overcome the intrinsically fast 
performance of direct LU factorization on small systems that is also seen in the other 
two systems. 

7. Conclusion 

We have examined the use of available numerical solution methods and matrix reordering 
strategies for solving for the steady-state density matrix for several common time- 
independent systems found in quantum optics and related sub-disciplines. In addition, 
we have introduced a doubly-iterative inverse-power method, making use of the fast, 
usually single step, convergence of the inverse-power method to replace full LU 
factorization with an iLU decomposition. These solution techniques were tested on 
several standard quantum optical systems were it is found that iterative methods based 
on RCM reordering of the system Liouvillian outperform techniques based on direct 
LU factorization for large Hilbert space dimensions. Moreover, iterative solvers using 
COLAMD ordering fail at large dimensions due to poor conditioning of the approximate 
inverse, while those based on RCM remain stable. Provided that the RCM bandwidth 
of the modified Liouvillian jC is nearly the same as that of the shifted Liouvillian then 
preconditioned GMRES solvers using jC should be the hrst choice of iterative method 
as one does not need to worry about eigenvalue conditioning. If instead, bandwidth 
reduction becomes hampered, due to the introduction of the trace preserving matrix 7^, 
then the doubly-iterative inverse-power method can be employed. Regardless of which 
iterative method is used, finding an appropriate drop-tolerance and fill-factor for the 
approximate inverse must still be done on a trial and error basis, although the default 
values used in QuTiP are a good starting point. 

In contrast to iterative methods, a clear understanding of the optimal matrix 
permutation method for direct LU factorization is still lacking. Indeed, there is no a 
priori information by which to judge whether RGM or GOLAMD will be the best choice 
in terms of hll-in, or perhaps if some other as yet unstudied reordering will be most 
efficient. Although these orderings give £11-ins that are usually on par with each other. 
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for large quantum systems, the choice of permutation can mean the difference between 
finding a solution and running into memory limitations. Therefore, understanding how 
the Liouvillian matrix structure affects the fill-in in each of these methods is a critical 
area of future research. Likewise, exploring additional iLU reorderings based on BFS 
(like RCM), or the related depth-first search, may give enhanced bandwidth and profile 
reduction while maintaining good conditioning of the approximate inverse [35, 36]. As 
with the results presented here, this line of endeavors will help to extend the applicability 
of classical solution methods to the steady-state density matrix to ever larger quantum 
systems in absence of a quantum computer. 

All of the numerical algorithms presented in this work are freely available in the 
QuTiP: Quantum Toolbox in Python library [41, 42, 43]. 
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